Code
library(tidyverse)
library(here)
library(readr)
library(knitr)
library(kableExtra)
library(DT)
library(broom)
library(car)
library(patchwork)
library(gganimate)
library(gifski)
library(plotly)High-tech exports, as taken from the World Bank metadata glossary, are “products with high R&D intensity, such as in aerospace, computers, pharmaceuticals, scientific instruments, and electrical machinery.” Microchips are a great example of high-tech exports as they are capital-intensive and require specialized knowledge/patents to produce. There are, however, prerequisites to being able to develop the robust infrastructure necessary to manufacture these kinds of products. A country needs strong legal protections, ease of transportation, reliable power grids, and access to highly specialized human capital. Naturally, countries capable of overcoming these prerequisites are keenly placed to attract public and private investment.
Foreign Direct Investment Inflows as a % of the receiving country’s GDP (FDI Inflows) will be the form of investment that we focus on to understand the relationship between a country’s high-tech production potential and its “investability”. The purpose of FDI for an investing party is to financially or logistically leverage their interest for either monetary or capital gain. Particularly, the interested party may want to expand their current operations in another country through mergers or acquisitions. Furthermore, the interested party may simply want to invest in a foreign company to utilize that company’s expertise in an unfamiliar market to the investor. To note, investors in this category are typically firms or other countries, e.g. China’s One Belt One Road initiative.
Given the obvious perceived relationship between FDI and high-tech exports, we hypothesize that as a county’s potential to manufacture high-tech products increases (measured as a percentage of a country’s total manufactured exports), that country will see an increase in FDI inflows (measured as a percent of country’s GDP). The data from which we draw our conclusions come from the World Bank, however, this data was compiled by an intermediary, Gapminder, an apolitical, nonprofit Swedish analytics and statistical advisory organization.
The “Exports” data set that was used involved high-technology exports (as a % of total production exports).
The “FDI” data set we used was foreign investment inflows (direct, net % of GDP).
Country: Country
Year: Year (2006-2019)
Exports: High-tech exports (% of manufactured exports)
FDI: (Foreign Investment inflows - disinvestment)/GDP
Ho: High-tech exports do not have an effect on FDI inflows.
Ha: High-tech exports has a positive effect on FDI inflows.
library(tidyverse)
library(here)
library(readr)
library(knitr)
library(kableExtra)
library(DT)
library(broom)
library(car)
library(patchwork)
library(gganimate)
library(gifski)
library(plotly)FDI <- read_csv("foreign_direct_investment_net_inflows_percent_of_gdp.csv")
Exports <- read_csv("high_technology_exports_percent_of_manufactured_exports.csv")After reading in the data, we saw that the raw data sets provided by Gapminder initially had columns dedicated to each year of recorded data. To make the data tidy, we condensed each year’s data into one column, where a specific combination of one country-year-variable constituted one row. Now that our data sets have met the tidy requirements, we can now move on to cleaning out irrelevant data and conjoining the FDI and Export data sets into one.
FDI_long <- FDI |>
pivot_longer(cols = "1969":"2019",
names_to = "Year",
values_to = "FDI") |>
mutate(Year = as.numeric(Year),
FDI = as.numeric(FDI))
invisible(datatable(
FDI_long,
caption = "Foreign Direct Investment Inflows by Year and Country:",
rownames = FALSE,
class = 'cell-border stripe',
style = 'default'
))
exports_long <- Exports |>
mutate("2008" = as.numeric(`2008`),
"2010" = as.numeric(`2010`),
"2016" = as.numeric(`2016`)) |>
pivot_longer(cols = "2006":"2019",
names_to = "Year",
values_to = "Exp") |>
mutate(Year = as.numeric(Year),
Exp = as.numeric(Exp))
invisible(datatable(
exports_long,
caption = "High-Tech Exports Direct by Year and Country:",
rownames = FALSE,
class = 'cell-border stripe',
style = 'default'
))We filtered out the low-income and low-middle-income countries as categorized by the World Bank. These are countries that have minimal to no capacity to produce high-tech exports and have therefore been excluded from the data set to more accurately analyze the relationship between high-tech exports and foreign direct investment inflows. Furthermore, we omitted all NA values, seeing as no relationship can be observed if either FDI or Exp is empty. These restrictions removed 583 observations. After joining together the FDI inflows and high-tech exports data sets and filtering out low and low-middle-income countries, there are now 1,527 observations. We can now proceed to preliminary data visualizations.
Note: The data collected on FDI Inflows went as far back as 1969, however, our joint data (seen later) was constrained by the High Tech Exports data set. Information collected on high-tech exports only started in 2006.
low_lowmid <- c("Afghanistan",
"Burkina Faso",
"Burundi",
"Central African Republic",
"Chad",
"Congo, Dem. Rep",
"Eritrea",
"Ethiopia",
"Gambia, The",
"Guinea",
"Guinea-Bissau",
"India",
"Kenya",
"Korea, Dem. Rep.",
"Liberia",
"Madagascar",
"Malawi",
"Mali",
"Mozambique",
"Niger",
"Philippines",
"Rwanda",
"Samoa",
"Senegal",
"Sierra Leone",
"Solomon Islands",
"Somalia",
"South Sudan",
"Sri Lanka",
"Sudan",
"Syrian Arab Republic",
"Tanzania",
"Togo",
"Uganda",
"Ukraine",
"Vanuatu",
"Vietnam",
"Yemen",
"Zambia",
"Zimbabwe")
joint <- right_join(FDI_long, exports_long) |>
filter(!(country %in% low_lowmid)) |>
na.omit()
invisible(datatable(
joint,
filter = 'top',
caption = "Foreign Direct Investment Inflows (FDI) and Exports by Country & Year:",
rownames = FALSE,
class = 'cell-border stripe',
style = 'default',
colnames = c("Country" = "country",
"Year" = "Year",
"FDI" = "FDI",
"Exports" = "Exp")
))joint |>
ggplot(aes(x = sqrt(Exp),
y = sqrt(FDI))) +
geom_jitter(alpha = 0.3) +
geom_smooth(method = "lm") +
theme_bw() +
labs(y = "",
x = "High-Tech Exports (Square Root of % of Manufactured Exports)",
subtitle = "Foreign Investment Inflows (Direct, Square Root of Net % of GDP)",
title = "Linear Regression of FDI and High-Tech Exports") In Figure 1, a linear regression (with square root transformations applied to tighten the spread of both variables) between high-tech exports and FDI inflows is depicted. There is a slight positive linear slope indicating that as % of high-tech manufactured exports increases, % of direct FDI inflows will also increase. However, there appear to be multiple outliers (across both variables) which are positively skewing the regression.
Many of the observed outliers came from just a handful of countries. Most notably, Cyprus and Malta. These two, unique island countries benefit from having highly educated, English-speaking populations, strong legal systems (which were taken from their ex-colonizer, the United Kingdom), and enticing tax incentives. This combination of factors (including the wonderful, year-round weather) positioned these small economies in such a way that allowed them to take in >100% of their respective GDPs in FDI numerous times between the years 2006-2019.
And just within these two countries can the beginnings of a positive relationship between high-tech exports and foreign direct investment inflows be seen over time. Inversely, years of proportionately larger levels of high-tech exportation followed by years with lesser levels saw subsequent declines in foreign direct investment, e.g. Cyprus 2011-2012.
animate_joint <- joint |>
na.omit() |>
group_by(Year) |>
summarise(across(.cols = c(FDI, Exp), mean)) |>
ggplot(aes(x = Year)) +
geom_line(aes(y = Exp, color = "#BC066A"), linetype = "solid" ) +
geom_line(aes(y = FDI, color = "#3DC1AC"), linetype = "solid" ) +
scale_color_manual(values = c("#BC066A", "#3DC1AC"),
labels = c("High-Tech Exports", "Foreign Investment Inflows")) +
labs(y = "",
subtitle = "High-Tech Exports (% of manufactured exports)
Foreign Investment Inflows (direct, net % of GDP)",
x = "Year",
title = "Average Global FDI Inflows and High-Tech Exports Over Year:",
color = "Legend") +
theme_bw() +
theme_minimal() +
transition_reveal(Year)
animate(animate_joint, renderer = gifski_renderer())Figure 2 depicts a time series plot between the years of 2006 to 2019 of the average global % of high-tech exports manufactured and % of direct FDI inflows of GDP. Initially, the global averages of high-tech exports and FDI inflows seem to maintain a roughly positive relationship with the exception of 2010-2011. Foreign investment inflows should rather be understood as lagging positively to the volatility of global average high-tech exports, as seen between 2012-2016. This lagging effect can be explained by the stickiness of contractual obligations in investments. Regardless of economic conditions, investment obligations must be fulfilled which can cause FDI inflows to be increasing as global production of high-tech exports is decreasing. On another note, the noticeable decrease in High-tech exports between the years 2006 to 2010 is likely due to the Great Recession and its stymieing effects on the global economy. Although high-tech exports’ various peaks and troughs likely project a weak R-Squared value for the following linear regression below, we still believe there is merit in analyzing the relationship between these two economic indicators. Before we are able to start producing our linear model, we must first check model assumptions to verify our potential model’s validity.
Given Figure 1, a linear model initially appears viable, particularly if the outliers were to be ignored.
aug_lm <- lm(sqrt(FDI) ~ sqrt(Exp), data = joint)
aug_joint <- augment(aug_lm)
aug <- aug_joint |>
ggplot(aes(x = .fitted,
y = .resid)) +
geom_point(alpha = .3) +
geom_line(y = 0, color = "blue") +
labs(x = "Fitted Residuals",
y = "",
subtitle = "Residuals",
title = "Versus Fits for FDI and High-Tech Exports") +
theme_bw() +
theme_minimal()
augEven after applying a square root transformation to both the dependent (FDI inflows) and independent (high tech exports), the assumption of equal variance is clearly violated in Figure 3. The spread of the residuals is not equal across all fitted values, therefore we must be cautious in drawing conclusions from our linear model across all possible high-tech export percentages.
ggplot(data = aug_lm, aes(x = sqrt(aug_lm$residuals))) +
geom_histogram(fill = 'steelblue', color = 'black') +
labs(title = 'Histogram of Residuals',
x = 'Residuals',
subtitle = 'Frequency',
y = '') +
theme_bw() +
theme_minimal()
regression <- lm(sqrt(FDI) ~ sqrt(Exp), data = joint)
normality_check <- aov(regression, var.equal = FALSE)
#Shapiro-Wilks Visual Normality Test
#qq_plot <- qqPlot(normality_check$residuals,
# id = FALSE,
# main = "Normal Q-Q Plot of Residuals",
# xlab = "Normal Quantiles",
# ylab = "Normality Check Residuals")Furthermore, in Figure 4 (again, assuming square root transformations in the base model), it appears that the assumption of normality of residuals is also violated, as seen in the right-tailed skew of the histogram. Furthermore, when producing a Shapiro-Wilks test (not included in report), our residuals visibly strayed from the acceptable normal range in the plot. Once again, we must be cautious when interpreting our model and when deciding whether or not the results can be used for future economic analysis.
It must also be stated that independence between observations, when working with countries, is near impossible. For example, political instability or repressive regimes found in the Middle East do not solely impact only their own country, but also the countries around them. This kind of political instability deters investors from allocating resources to such regions in fear of their investments being seized, destroyed, or in ways becoming unrecoverable.
Regrettably, our data violates multiple of the assumptions listed above, meaning our model should not be applied for future economic analysis. The model will also fail to be an accurate predictor of Foreign Direct Investment Inflows given high tech production for countries within our data set.
The statistical method used was a linear regression with a square root transformation on both the high-tech export variable and FDI inflows variable. By using this transformation, we were able to describe a linear relationship between the two variables while also controlling for highly skewed data points. This method was used to easily identify a relationship between these two variables and because it can result in relatively small standard errors by minimizing the sum of squares errors.
regression <- lm(sqrt(FDI) ~ sqrt(Exp), data = joint)
reg <- coef(summary(regression))
rownames(reg) <- c("$\\beta_0$", "$\\beta_1$")
knitr::kable(reg,
format = "html",
caption = "Linear Regression of FDI Inflows and High-Tech Exports",
escape = FALSE) |>
kable_styling(font_size = 15)| Estimate | Std. Error | t value | Pr(>|t|) | |
|---|---|---|---|---|
| \(\beta_0\) | 1.5943667 | 0.0782904 | 20.364789 | 0 |
| \(\beta_1\) | 0.2065562 | 0.0238794 | 8.649975 | 0 |
Table 1
Based on Table 1, our estimated regression equation is \(\widehat{Y} = \beta_0 + \beta_1(X)\) or \(\widehat{\sqrt{FDI}} = 1.594 + 0.206(\sqrt{HT Exports})\).
The relationship between Foreign Direct Investment Inflows and High-Tech Exports can be described as such: every one percent increase in the square root of a country’s high-tech exports (explanatory) leads to an average 0.207 increase in the square root of net % of GDP in foreign direct investment inflows (dependent).
Regardless of the aforementioned violations in equal variances and normality of residuals, we will cautiously produce an analysis of variance on our linear model to help us with drawing conclusions on the relationship between high-tech exports and FDI inflows.
anova <- anova(regression)
SSModel <- anova$`Sum Sq`[1]
SSError <- anova$`Sum Sq`[2]
SSTotal <- SSModel + SSError
Description <- c("Variance in the Response Variable", "Variance of the Fitted Values", "Variance in the Residuals")
Sum_of_Squares <- c(SSTotal, SSModel, SSError)
variance_table <- data.frame(Description, Sum_of_Squares) |>
pivot_wider(names_from = Description,
values_from = Sum_of_Squares)
knitr::kable(variance_table,
format = "html",
caption = "Analysis of Variance on Simple Linear Regression of Hightech Exports and FDI Inflows",
escape = FALSE) |>
kable_styling(font_size = 15, position = "center")| Variance in the Response Variable | Variance of the Fitted Values | Variance in the Residuals |
|---|---|---|
| 4022.938 | 188.1488 | 3834.79 |
Table 2
The linear model explains 4.67% of the variation in the response (sqrt(FDI inflows)) ((\(\frac{188.1488}{4022.938}\)) = 0.046769, values taken from Table 2). This suggests that the quality of our model is poor at explaining any variation in FDI inflows. Approximately 95% of the variation in the response has been left unexplained ((\(\frac{3834.79}{4022.938}\)) = 0.9532); therefore more variability in the response could be explained by adding additional relevant covariates such as other non-high tech products or services, tax rates on investments, favorable political climates (e.g. stricter vs. relaxed labor laws), and more. Nevertheless, this result is unsurprising seeing as a country’s capacity to manufacture high-tech products typically is not the sole motivator for foreign investment towards another country or its firms.
While countries that emphasized high-tech production were typically rewarded with more foreign direct investment, other macroeconomic variables should be included within the model to increase the model’s R-Squared value. Furthermore, simulating our data and comparing our observed results to our simulated output (seen below) will show that not only may high-tech exports not be a strong explanatory variable but that a linear model is most likely not the ideal choice given the spread of the observed data.
To aid our understanding of the strength of the relationship seen in our model (or lack thereof), we will run many simulations randomizing the residuals of model-predicted FDI values. In Figures 5 and 6, we compared the distribution of our observed FDI values (transformations included) to the simulated, normal distribution of FDI values. The obvious skewness in the observed data differed significantly in shape when compared to the approximately normal distribution of the simulated data (once again, produced from the observed regression model), highlighting how our model is likely not a good fit for our data. This insight was further reinforced by a scatter plot of the simulated data (not included in the report) where the data points clustered in such a way that failed to depict any obvious relationship (looked nearly indistinguishable from a shotgun blast).
set.seed(10)
predict_regression <- predict(regression)
sigma_regression <- sigma(regression)
#random generation of residuals to be added to the predicted FDI values produced from plugging our observed exps into our regression model
noise <- function(x, mean = 0, sd){
x + rnorm(length(x),
mean,
sd)
}
#creates tibble/vector of simulated FDI given our model predicted FDI values
simulated_reg <- tibble(sim_FDI = noise(predict_regression,
sd = sigma_regression)
)#observed FDI and count
observed_response <- joint |>
ggplot(aes(x = sqrt(FDI))) +
geom_histogram(fill = "#3DC1AC",
color = "black") +
theme_bw() +
theme_minimal()
obs_plot <- ggplotly(observed_response)
sim_response <- simulated_reg |>
ggplot(aes(x = sim_FDI)) +
geom_histogram(fill = "#3DC1AC",
color = "black") +
theme_bw() +
theme_minimal()
sim_plot <- ggplotly(sim_response)
both <- subplot(obs_plot, sim_plot) %>%
layout(xaxis = list(title = "Square Root of FDI"),
xaxis2 = list(title = "Square Root of FDI"),
annotations = list(
list(x = 0.2,
y = 1.0,
text = "Observed Data Frequencies",
showarrow = FALSE,
xref='paper',
yref='paper',
align = "left",
showarrow = FALSE,
font = list(size = 12)),
list(x = 0.8,
y = 1.0,
text = "Simulated Data Frequencies",
showarrow = FALSE,
xref='paper',
yref='paper',
align = "right",
font = list(size = 12))))
bothFigures 5 and 6
#have joint data represent our transformation
joint <- joint |>
mutate(FDI_sqrt = sqrt(FDI),
Exp_sqrt = sqrt(Exp))
#creates data frame with both observed and simulated FDI
sim_data <- joint |>
filter(!is.na(FDI_sqrt),
!is.na(Exp_sqrt)
) |>
select(FDI_sqrt, Exp_sqrt) |>
bind_cols(simulated_reg)
#Observed dot plot describing relationship between both observed variables
observed_pred_plot <- joint |>
ggplot(aes(x = Exp_sqrt,
y = FDI_sqrt)) +
geom_point() +
labs(x = "Square Root of High-tech Exports \n(% of manufactured exports)",
y = "",
subtitle = "Square Root of FDI (direct, net % of GDP)",
title = "Observed FDI") +
theme_bw() +
theme_minimal() +
transition_time(Exp_sqrt) +
ease_aes('linear') +
shadow_trail()
invisible(animate(observed_pred_plot, renderer = gifski_renderer()))
#Simulated FDI dot plot with observed exp data
simulated_pred_plot <- sim_data |>
ggplot(aes(x = sim_FDI,
y = Exp_sqrt)
) +
geom_point() +
labs(x = "Square Root of High-tech Exports \n(% of manufactured exports)",
y = "",
subtitle = "Square Root of FDI (direct, net % of GDP)",
title = "Simulated FDI"
) +
theme_bw() +
theme_minimal() +
transition_time(sim_FDI) +
ease_aes('linear') +
shadow_trail()
invisible(animate(simulated_pred_plot, renderer = gifski_renderer()))
#Dot plot of Observed FDI vs. Simulated FDI
invisible(sim_data |>
ggplot(aes(x = sim_FDI,
y = FDI_sqrt)
) +
geom_point() +
theme_bw() +
theme_minimal() +
labs(x = "Simulated Square Root of FDI \n(direct, net % of GDP)",
y = "",
subtitle = "Observed Square Root of FDI \n(direct, net % of GDP)",
title = "FDI (direct, net % of GDP) Simulated and Observed") +
geom_abline(slope = 1,
intercept = 0,
color = "#3DC1AC",
linetype = "dashed",
lwd = 1.5))Unfortunately, performing a linear regression between the observed and simulated FDI only further showed the inadequacy of our model to accurately predict direct foreign investment. When regressing the two, a near zero R-squared value of 0.00149 is produced, implying that almost no variability in the observed FDI could be described by the simulated FDI. After simulating many regression models between the observed FDI and many simulated FDI values, we summarized the results of these regressions by counting the frequency of different R-squared values in a histogram. We discovered that an overwhelming majority (\(\frac{97}{100}\)) of these simulated regressions in Figure 7 report R-squared values of less than 0.01. Thus, we conclude that our simulated data under this linear model are almost entirely different from what was observed.
#after running a linear model between our observed and simulated response variable, we pull out just the R-squared
sim_R_Sq <- lm(FDI_sqrt ~ sim_FDI,
data = sim_data) |>
glance() |>
select(r.squared) |>
pull()
nsims <- 100
sims <- map_dfc(.x = 1:nsims,
.f = ~ tibble(sim = noise(predict_regression,
sd = sigma_regression)
)
)
#replace the "..." originally produced in all of the sims with "_"
colnames(sims) <- colnames(sims) |>
str_replace(pattern = "\\.\\.\\.",
replace = "_")
invisible(datatable(sims,
caption = "1,000 New Simulated Observations",
rownames = FALSE,
class = "cell-border stripe",
style = "default"
))
sims <- joint |>
filter(!is.na(FDI_sqrt),
!is.na(Exp_sqrt)) |>
select(FDI_sqrt) |>
bind_cols(sims)
sim_r <- sims |>
map(~ lm(FDI_sqrt ~ .x, data = sims)) |>
map(glance) |>
map_dbl(~.x$r.squared)
sim_r <- sim_r[names(sim_r) != "FDI_sqrt"]Figure 7
In this report, we have conducted a preliminary linear analysis of the relationship between a country’s high-tech product exports (as % of all production exports) and the country’s foreign direct investment inflows (as a % of GDP). Although our initial desire was to draw a relationship between the technological manufacturing capacity of a country to its foreign investment inflows, our model proved problematic in most stages of analysis.
Without passing assumptions of independence, normality of residuals, or equal variances, the validity of our model was already in question. After fitting a linear model, we saw a statistically significant relationship (after applying square root transformations) between high-tech exports and foreign direct investment inflows, yet the amount of variability our model explained was minuscule.
Should anyone replicate our analysis of this relationship, we would implore them to add additional covariates to avoid confounding variables or to decide on a different statistical method altogether. In retrospect, removing the outliers (such as Malta or Cyprus) would have proven beneficial and likely increased the amount of variability explained in our model. Furthermore, more data trimming should be considered by adding another restriction (not including our limitation of using only middle-income countries and wealthier) to cut down on the hundreds of observations which sat roughly between 1-9% on high-tech exports and 1-9% of foreign direct investment inflows.
We still believe there is merit in exploring the relationship between high-tech production capacity and a country’s perceived “investability”, but urge that any models doing so must be far more complicated and robust.
https://www.oecd-ilibrary.org/finance-and-investment/foreign-direct-investment-fdi/indicator-group/english_9a523b18-en#:~:text=Foreign%20direct%20investment%20(FDI)%20is,enterprise%20resident%20in%20another%20economy
https://datahelpdesk.worldbank.org/knowledgebase/articles/906519-world-bank-country-and-lending-groups
https://www.weforum.org/agenda/2022/01/least-developed-countries-ldc-technology/
https://statsandr.com/blog/anova-in-r/#normality –> helped in producing the shapiro-wilks test
https://www.bizjournals.com/boston/blog/mass-high-tech/2008/09/mass-high-tech-exports-decrease-almost-10.html
https://www.investmentmonitor.ai/features/fdi-drivers-and-political-stability/
https://databank.worldbank.org/metadataglossary/world-development-indicators/series/TX.VAL.TECH.MF.ZS
https://www.cnbc.com/2021/03/16/2-charts-show-how-much-the-world-depends-on-taiwan-for-semiconductors.html
https://www.csis.org/blogs/trustee-china-hand/data-dive-private-sector-drives-growth-chinas-high-tech-exports
https://www.the-american-interest.com/2019/10/22/how-china-rode-the-foreign-technology-wave/
https://bookdown.org/yihui/rmarkdown-cookbook/hide-one.html
https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/invisible
https://plotly.com/ggplot2/geom_histogram/
https://plotly.com/r/subplots/
https://www.statpower.net/Content/310/R%20Stuff/SampleMarkdown.html
https://plotly.com/r/figure-labels/